On a Construction of Friedman 

Jeffrey Shallit* and Ming-wei Wang 
Department of Computer Science

University of Waterloo

Waterloo, Ontario, Canada N2L 3G1

shallit@graceland.uwaterloo.ca

m2wang@math.uwaterloo.ca

February 1, 2008

Abstract 

H. Friedman obtained remarkable results about the longest finite sequence x such
that for all i ≠ j the word x[i..2i] is not a subsequence of x[j..2j]. In this note we
consider what happens when "subsequence" is replaced by "subword".


1 Introduction 


We say a word y is a subsequence of a word z if y can be obtained by striking out zero or more
symbols from z. For example, "iron" is a subsequence of "introduction". We say a word y
is a subword of a word z if there exist words w, x such that z = wyx. For example, "duct"
is a subword of "introduction".¹
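The two relations can be illustrated with a short Python sketch. Words are represented as strings, and the function names `is_subsequence` and `is_subword` are our own hypothetical choices, not notation from the paper.

```python
# Hypothetical helper names; an illustrative sketch, not the authors' code.

def is_subsequence(y: str, z: str) -> bool:
    """True if y is obtained from z by striking out zero or more symbols."""
    remaining = iter(z)
    # `c in remaining` consumes the iterator up to the first match of c,
    # so the letters of y must be found in z in the same order.
    return all(c in remaining for c in y)


def is_subword(y: str, z: str) -> bool:
    """True if z = w y x for some words w, x, i.e. y occurs contiguously in z."""
    return y in z


if __name__ == "__main__":
    print(is_subsequence("iron", "introduction"))  # True
    print(is_subword("iron", "introduction"))      # False
    print(is_subword("duct", "introduction"))      # True
```

Every subword is a subsequence, but "iron" shows the converse fails.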

We use the notation x[k] to denote the k'th letter of the string x. (The first
letter of a string is x[1].) We write x[a..b] to denote the subword of x of length b − a + 1
starting at position a and ending at position b.

Recently H. Friedman has found a remarkable construction that generates extremely
large numbers [1, 2]. Namely, consider words over a finite alphabet Σ of cardinality k. If an
infinite word x has the property that for all i, j with 1 ≤ i < j the subword x[i..2i] is not a
subsequence of x[j..2j], call it self-avoiding. We apply the same definition for a finite word
x of length n, imposing the additional restriction that j ≤ n/2.
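For a finite word the definition can be checked directly. The following is a minimal, naive sketch (roughly cubic in the length of the word), assuming 1-based positions as in the text; `window` and `is_self_avoiding` are hypothetical names.

```python
# Hypothetical helper names; an illustrative sketch, not the authors' code.

def is_subsequence(y: str, z: str) -> bool:
    remaining = iter(z)
    return all(c in remaining for c in y)


def window(x: str, i: int) -> str:
    """x[i..2i] in the 1-based, inclusive notation of the text."""
    return x[i - 1 : 2 * i]


def is_self_avoiding(x: str) -> bool:
    """Finite self-avoidance: for all 1 <= i < j <= n/2 (n = len(x)),
    x[i..2i] is not a subsequence of x[j..2j]."""
    m = len(x) // 2  # largest j with 2j <= n
    return not any(
        is_subsequence(window(x, i), window(x, j))
        for j in range(2, m + 1)
        for i in range(1, j)
    )


print(is_self_avoiding("000"))          # True
print(is_self_avoiding("0000"))         # False: x[1..2] = 00 is a subsequence of x[2..4] = 000
print(is_self_avoiding("01110000000"))  # True
```

The last example is one binary word of length 11 that passes the check.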

Friedman shows there are no infinite self-avoiding words over a finite alphabet. Furthermore,
he shows that for each k there exists a longest finite self-avoiding word x over
an alphabet of size k. Call n(k) the length of such a word. Then clearly n(1) = 3, and
a simple argument shows that n(2) = 11. Friedman shows that n(3) is greater than the
incomprehensibly large number $A_{7198}(158386)$, where A is the Ackermann function.

¹ Europeans sometimes use the term "factor" for what we have called "subword", and they use the term
"subword" for what we have called "subsequence".
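The small values n(1) = 3 and n(2) = 11 can be confirmed by brute force. The sketch below is our own illustration, not Friedman's argument: it explores self-avoiding words breadth-first, relying on the fact that every prefix of a self-avoiding word is again self-avoiding, and on Friedman's result that n(k) is finite (otherwise the loop would not stop). It is only practical for k ≤ 2; `n_of_k` is a hypothetical name.

```python
# Hypothetical helper names; an illustrative sketch, not the authors' code.
from collections import deque


def is_subsequence(y: str, z: str) -> bool:
    remaining = iter(z)
    return all(c in remaining for c in y)


def is_self_avoiding(x: str) -> bool:
    m = len(x) // 2

    def window(i: int) -> str:
        return x[i - 1 : 2 * i]  # x[i..2i], 1-based

    return not any(
        is_subsequence(window(i), window(j))
        for j in range(2, m + 1)
        for i in range(1, j)
    )


def n_of_k(k: int) -> int:
    """Length of the longest finite self-avoiding word over {0, ..., k-1}.
    Only self-avoiding words are expanded: a word with a
    non-self-avoiding prefix cannot itself be self-avoiding."""
    alphabet = "0123456789"[:k]
    longest = 0
    queue = deque([""])
    while queue:
        x = queue.popleft()
        if is_self_avoiding(x):
            longest = max(longest, len(x))
            queue.extend(x + a for a in alphabet)
    return longest


print(n_of_k(1), n_of_k(2))  # 3 11
```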

Jean-Paul Allouche asked what happens when "subsequence" is replaced by "subword".
A priori we do not expect results as strange as Friedman's, since there are no infinite
anti-chains for the partial order defined by "x is a subsequence of y" (this is Higman's lemma),
while there are infinite anti-chains for the partial order defined by "x is a subword of y"
(for example, the words 101, 1001, 10001, ...).
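The contrast can be seen on a concrete family. This sketch, with hypothetical helper names, checks that the words 1 0^n 1 are pairwise incomparable under the subword order, although each shorter one is a subsequence of each longer one. It illustrates an infinite anti-chain for "subword" only; the statement about "subsequence" is Higman's lemma.

```python
# Hypothetical helper names; an illustrative sketch, not the authors' code.

def is_subsequence(y: str, z: str) -> bool:
    remaining = iter(z)
    return all(c in remaining for c in y)


def is_subword(y: str, z: str) -> bool:
    return y in z


def check_family(max_n: int) -> bool:
    """Check the words 1 0^n 1 for 1 <= n <= max_n: no one is a subword
    of another, but each shorter one is a subsequence of each longer one."""
    words = ["1" + "0" * n + "1" for n in range(1, max_n + 1)]
    for a in words:
        for b in words:
            if a == b:
                continue
            if is_subword(a, b):
                return False
            if len(a) < len(b) and not is_subsequence(a, b):
                return False
    return True


print(check_family(20))  # True
```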

2 Main Results 

If an infinite word x has the property that for all i, j with 1 ≤ i < j the subword x[i..2i] is
not a subword of x[j..2j], we call it weakly self-avoiding. If x is a finite word of length n, we
apply the same definition with the additional restriction that j ≤ n/2.
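Replacing the subsequence test by a substring test gives a direct check of weak self-avoidance for finite words. The sketch below (hypothetical helper names, 1-based windows as in the text) applies it to the eight longest words listed in Theorem 1 (b) and to their one-letter extensions.

```python
# Hypothetical helper names; an illustrative sketch, not the authors' code.

def is_weakly_self_avoiding(x: str) -> bool:
    """Finite weak self-avoidance: for all 1 <= i < j <= n/2 (n = len(x)),
    x[i..2i] does not occur as a subword of x[j..2j]."""
    m = len(x) // 2

    def window(i: int) -> str:
        return x[i - 1 : 2 * i]  # x[i..2i], 1-based and inclusive

    return not any(window(i) in window(j) for j in range(2, m + 1) for i in range(1, j))


def complement(x: str) -> str:
    """Exchange the letters 0 and 1."""
    return x.translate(str.maketrans("01", "10"))


longest = ["0010111111010", "0010111111011", "0011110101010", "0011110101011"]
longest += [complement(w) for w in longest]

print(all(is_weakly_self_avoiding(w) for w in longest))                    # True
print(any(is_weakly_self_avoiding(w + a) for w in longest for a in "01"))  # False
```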

Theorem 1 Let Σ = {0, 1, ..., k − 1}.

(a) If k = 1, the longest weakly self-avoiding word is of length 3, namely 000. 

(b) If k = 2, there are no weakly self-avoiding words of length > 13. There are 8 longest
weakly self-avoiding words, namely 0010111111010, 0010111111011, 0011110101010,
0011110101011 and the four words obtained by changing 0 to 1 and 1 to 0.

(c) If k = 3, there exists an infinite weakly self-avoiding word. 

Proof. 

(a) If a word x over Σ = {0} is of length ≥ 4, then it must contain 0000 as a prefix. Then
x[1..2] = 00 is a subword of x[2..4] = 000.

(b) To prove this result, we create a tree whose root is labeled with ε, the empty word. If
a node's label x is weakly self-avoiding, then it has two children labeled x0 and x1. This tree
is finite if and only if there is a longest weakly self-avoiding word. In this case, the leaves
of the tree represent non-weakly-self-avoiding words that are minimal in the sense that any
proper prefix is weakly self-avoiding.

Now we use a classical breadth-first tree traversal technique, as follows. We maintain
a queue, Q, and initialize it with the empty word ε. If the queue is empty, we are done.
Otherwise, we pop the first element q from the queue and check to see if it is weakly
self-avoiding. If not, the node is a leaf, and we print it out. If q is weakly self-avoiding, then we
append q0 and q1 to the end of the queue, and repeat.

If this algorithm terminates, we have proved that there is a longest weakly self-avoiding 
word. The proof may be concisely represented by listing the leaves in breadth-first order. 
We may shorten the tree by assuming, without loss of generality, that the root is labeled 0. 
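The breadth-first procedure can be reproduced in a few lines. The following Python sketch is an illustration under stated assumptions, not the authors' program: `bfs_leaves` is a hypothetical name, the root is taken to be 0 as in the text, and the leaves are returned in the order they are found.

```python
# Hypothetical helper names; an illustrative sketch, not the authors' code.
from collections import deque
from typing import List


def is_weakly_self_avoiding(x: str) -> bool:
    m = len(x) // 2

    def window(i: int) -> str:
        return x[i - 1 : 2 * i]  # x[i..2i], 1-based

    return not any(window(i) in window(j) for j in range(2, m + 1) for i in range(1, j))


def bfs_leaves(root: str = "0", alphabet: str = "01") -> List[str]:
    """Breadth-first traversal of the tree: a weakly self-avoiding
    label q gets children q0 and q1; any other label is a leaf."""
    queue = deque([root])
    leaves = []
    while queue:
        q = queue.popleft()
        if is_weakly_self_avoiding(q):
            queue.extend(q + a for a in alphabet)  # append q0 and q1
        else:
            leaves.append(q)  # minimal non-weakly-self-avoiding word
    return leaves


leaves = bfs_leaves()
print(len(leaves), max(len(q) for q in leaves))
```

Because children are appended in the order 0, 1, the leaves come out in breadth-first (length, then lexicographic) order.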

When we perform this procedure, we obtain a tree with 92 leaves, whose longest label is 
of length 14. The following list describes this tree: 
0000      00111100    0011010101    001011111011
0001      00111110    0011010110    001011111100
0101      00111111    0011010111    001011111110
001000    01000000    0011101000    001011111111
001001    01000001    0011101001    001110101000
001010    01000010    0011101011    001110101001
001100    01000011    0011110100    001110101010
010001    01100001    0011110110    001110101011
010010    01100010    0011110111    001111010100
010011    01100011    0110000000    001111010110
011001    01110001    0110000001    001111010111
011010    01110010    0110000010    011100000000
011011    01110011    0110000011    011100000001
011101    0010110100  0111000001    011100000010
011110    0010110101  0111000010    011100000011
011111    0010110110  0111000011    00101111110100
00101100  0010110111  001011110100  00101111110101
00110100  0010111000  001011110101  00101111110110
00110110  0010111001  001011110110  00101111110111
00110111  0010111010  001011110111  00111101010100
00111000  0010111011  001011111000  00111101010101
00111001  0010111100  001011111001  00111101010110
00111011  0011010100  001011111010  00111101010111

Figure 1: Leaves of the tree giving a proof of Theorem 1 (b), in breadth-first order (read down the columns).



(c) Consider the word 



$x = 22010110111011111011111110111111111110\cdots$

$\phantom{x} = 2\,2\,0\,1^{1}\,0\,1^{2}\,0\,1^{3}\,0\,1^{5}\,0\,1^{7}\,0\,1^{11}\,0\,1^{15}\,0\,1^{23}\,0\,1^{31}\,0\,1^{47}\cdots$

where there are 0's in positions 3, 5, 8, 12, 18, 26, 38, 54, 78, 110, 158, .... More precisely,
define $f_{2n+1} = 5 \cdot 2^{n} - 2$ for n ≥ 0, and $f_{2n} = 7 \cdot 2^{n-1} - 2$ for n ≥ 1. Then x has 0's only in the
positions given by $f_i$ for i ≥ 1.
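The description of x through the positions $f_i$ translates directly into a generator for prefixes of x. In this sketch `f` and `prefix_of_x` are hypothetical names; the two letters 2 in positions 1 and 2 are read off the displayed word.

```python
# Hypothetical helper names; an illustrative sketch, not the authors' code.

def f(i: int) -> int:
    """Position of the i-th 0 of x (i >= 1):
    f(2n+1) = 5*2^n - 2 for n >= 0, and f(2n) = 7*2^(n-1) - 2 for n >= 1."""
    if i % 2 == 1:
        return 5 * 2 ** ((i - 1) // 2) - 2
    return 7 * 2 ** (i // 2 - 1) - 2


def prefix_of_x(length: int) -> str:
    """The first `length` letters of x: 2 in positions 1 and 2,
    0 in the positions f(1), f(2), ..., and 1 elsewhere."""
    zeros = set()
    i = 1
    while f(i) <= length:
        zeros.add(f(i))
        i += 1
    return "".join(
        "2" if p <= 2 else "0" if p in zeros else "1"
        for p in range(1, length + 1)
    )


print([f(i) for i in range(1, 12)])
# [3, 5, 8, 12, 18, 26, 38, 54, 78, 110, 158]
print(prefix_of_x(38))
# 22010110111011111011111110111111111110
print([f(i + 1) - f(i) - 1 for i in range(1, 11)])  # lengths of the blocks of 1's
# [1, 2, 3, 5, 7, 11, 15, 23, 31, 47]
```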

First we claim that if i ≥ 3, then any subword of the form x[i..2i] contains exactly two
0's. This is easily verified for i = 3. If $5 \cdot 2^{n} - 1 \le i < 7 \cdot 2^{n} - 1$ and n ≥ 0, then there are
0's at positions $7 \cdot 2^{n} - 2$ and $5 \cdot 2^{n+1} - 2$. (The next is at position $7 \cdot 2^{n+1} - 2$, which is
$> 2(7 \cdot 2^{n} - 2)$.) On the other hand, if $7 \cdot 2^{n-1} - 1 \le i < 5 \cdot 2^{n} - 1$ for n ≥ 1, then there
are 0's at positions $5 \cdot 2^{n} - 2$ and $7 \cdot 2^{n} - 2$. (The next is at position $5 \cdot 2^{n+1} - 2$, which is
$> 2(5 \cdot 2^{n} - 2)$.)
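The two-zeros claim is easy to test numerically on a long prefix. The sketch below, with hypothetical names, repeats the prefix generator so that it runs on its own and counts the 0's in every window x[i..2i] with 3 ≤ i ≤ 2000. A finite check like this supports, but does not replace, the case analysis above.

```python
# Hypothetical helper names; an illustrative sketch, not the authors' code.

def f(i: int) -> int:
    """Position of the i-th 0 of x, as defined in the text."""
    if i % 2 == 1:
        return 5 * 2 ** ((i - 1) // 2) - 2
    return 7 * 2 ** (i // 2 - 1) - 2


def prefix_of_x(length: int) -> str:
    zeros = set()
    i = 1
    while f(i) <= length:
        zeros.add(f(i))
        i += 1
    return "".join(
        "2" if p <= 2 else "0" if p in zeros else "1"
        for p in range(1, length + 1)
    )


def zeros_in_window(x: str, i: int) -> int:
    """Number of 0's in x[i..2i] (1-based)."""
    return x[i - 1 : 2 * i].count("0")


x = prefix_of_x(2 * 2000)
print(all(zeros_in_window(x, i) == 2 for i in range(3, 2001)))  # True
```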

Now we prove that x is weakly self-avoiding. Clearly x[1..2] = 22 is not a subword of
any subword of the form x[j..2j] for any j ≥ 2, since the letter 2 occurs only in positions 1 and 2.
Similarly, x[2..4] = 201 is not a subword of any subword of the form x[j..2j] for any j ≥ 3.
Now consider subwords of the form t := x[i..2i] and t' := x[j..2j] for i, j ≥ 3 and i < j. From
above we know $t = 1^{u} 0 1^{v} 0 1^{w}$ and $t' = 1^{u'} 0 1^{v'} 0 1^{w'}$. For t to be a subword of t' we must
have u ≤ u', v = v', and w ≤ w'. But since the blocks of 1's in x are distinct in size, this means
that the middle block of 1's in t and t' must occur in the same positions of x. Then u ≤ u'
implies i ≥ j, a contradiction. ■
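A finite computation cannot replace the proof, but it is a useful sanity check on the construction: every prefix of a weakly self-avoiding infinite word must pass the finite test. The sketch below (hypothetical names again, self-contained) checks a prefix of length 400, that is, all pairs 1 ≤ i < j ≤ 200.

```python
# Hypothetical helper names; an illustrative sketch, not the authors' code.

def f(i: int) -> int:
    """Position of the i-th 0 of x, as defined in the text."""
    if i % 2 == 1:
        return 5 * 2 ** ((i - 1) // 2) - 2
    return 7 * 2 ** (i // 2 - 1) - 2


def prefix_of_x(length: int) -> str:
    zeros = set()
    i = 1
    while f(i) <= length:
        zeros.add(f(i))
        i += 1
    return "".join(
        "2" if p <= 2 else "0" if p in zeros else "1"
        for p in range(1, length + 1)
    )


def is_weakly_self_avoiding(x: str) -> bool:
    m = len(x) // 2

    def window(i: int) -> str:
        return x[i - 1 : 2 * i]  # x[i..2i], 1-based

    return not any(window(i) in window(j) for j in range(2, m + 1) for i in range(1, j))


print(is_weakly_self_avoiding(prefix_of_x(400)))  # True
```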

3 Another construction 

Friedman also has considered variations on his construction, such as the following: let $M_2(n)$
denote the length of the longest finite word x over {0, 1} such that x[i..2i] is not a subsequence
of x[j..2j] for n ≤ i < j. We can again consider this where "subsequence" is replaced by
"subword".

Theorem 2 There exists an infinite word x over {0, 1} such that x[i..2i] is not a subword
of x[j..2j] for all i, j with 2 ≤ i < j.

Proof. Let 

$x = 0010\,01^{3}\,01^{2}\,01^{7}\,01^{5}\,01^{15}\,01^{11}\,01^{31}\,01^{23}\cdots$

$\phantom{x} = 0010\,01^{g_1}\,01^{g_2}\,01^{g_3}\cdots$

where $g_1 = 3$, $g_2 = 2$, and $g_n = 2g_{n-2} + 1$ for n ≥ 3. Then a proof similar to that above
shows that every subword of the form x[i..2i] with i ≥ 2 contains exactly two 0's, and hence, since the
$g_i$ are all distinct, x[i..2i] is not a subword of x[j..2j] for j > i > 1. ■
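The word of Theorem 2 can be generated and tested in the same way. In this sketch `g`, `prefix_of_x` and `is_weakly_self_avoiding_from` are hypothetical names; the check covers all pairs 2 ≤ i < j ≤ 200 of a prefix of length 400. It is a sanity check only.

```python
# Hypothetical helper names; an illustrative sketch, not the authors' code.

def g(n: int) -> int:
    """g_1 = 3, g_2 = 2, and g_n = 2 g_{n-2} + 1 for n >= 3 (naive recursion)."""
    if n <= 2:
        return 3 if n == 1 else 2
    return 2 * g(n - 2) + 1


def prefix_of_x(length: int) -> str:
    """The first `length` letters of 0010 01^{g_1} 01^{g_2} 01^{g_3} ..."""
    word = "0010"
    n = 1
    while len(word) < length:
        word += "0" + "1" * g(n)
        n += 1
    return word[:length]


def is_weakly_self_avoiding_from(x: str, start: int) -> bool:
    """For all start <= i < j <= len(x)/2: x[i..2i] is not a subword of x[j..2j]."""
    m = len(x) // 2

    def window(i: int) -> str:
        return x[i - 1 : 2 * i]  # x[i..2i], 1-based

    return not any(
        window(i) in window(j) for j in range(start + 1, m + 1) for i in range(start, j)
    )


print([g(n) for n in range(1, 9)])  # [3, 2, 7, 5, 15, 11, 31, 23]
x = prefix_of_x(400)
print(all(x[i - 1 : 2 * i].count("0") == 2 for i in range(2, 201)))  # True
print(is_weakly_self_avoiding_from(x, 2))  # True
```

The bound i ≥ 2 cannot be dropped: x[1..2] = 00 occurs in x[3..6] = 1001.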